Introduction

Many colleges want to maximize the donations they receive from their alumni. To do so, they need to identify and predict the median salary and unemployment rate of recent graduates based on their education and other factors. With those predictions, colleges can direct more money toward the programs that yield the largest return on their investments (students).

Business Question:

Where should colleges invest in order to maximize the donations they later receive from recent graduates?

Analysis Question:

Based on recent graduates' characteristics and education, what is their predicted median salary? Will they earn more than $50,000, or less?

Background Information

This data is pulled from the 2010-12 American Community Survey Public Use Microdata Series and is limited to respondents under the age of 28. The code and data follow the FiveThirtyEight story [The Economic Guide to Picking a College Major](https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/).

Process Overview

We first clean the raw majors data (dropping identifier columns and collapsing categories), explore it with summary statistics and correlations, and then build two classifiers: a C5.0 decision tree that predicts whether a major's median salary exceeds $50,000, and a random forest trained on a combined salary-and-employment target. Each model is tuned and evaluated on held-out data.

Data Cleaning

A brief look at the raw data can be found below.

## 'data.frame':    172 obs. of  21 variables:
##  $ Rank                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Major_code          : int  2419 2416 2415 2417 2405 2418 6202 5001 2414 2408 ...
##  $ Major               : chr  "PETROLEUM ENGINEERING" "MINING AND MINERAL ENGINEERING" "METALLURGICAL ENGINEERING" "NAVAL ARCHITECTURE AND MARINE ENGINEERING" ...
##  $ Total               : int  2339 756 856 1258 32260 2573 3777 1792 91227 81527 ...
##  $ Men                 : int  2057 679 725 1123 21239 2200 2110 832 80320 65511 ...
##  $ Women               : int  282 77 131 135 11021 373 1667 960 10907 16016 ...
##  $ Major_category      : chr  "Engineering" "Engineering" "Engineering" "Engineering" ...
##  $ ShareWomen          : num  0.121 0.102 0.153 0.107 0.342 ...
##  $ Sample_size         : int  36 7 3 16 289 17 51 10 1029 631 ...
##  $ Employed            : int  1976 640 648 758 25694 1857 2912 1526 76442 61928 ...
##  $ Full_time           : int  1849 556 558 1069 23170 2038 2924 1085 71298 55450 ...
##  $ Part_time           : int  270 170 133 150 5180 264 296 553 13101 12695 ...
##  $ Full_time_year_round: int  1207 388 340 692 16697 1449 2482 827 54639 41413 ...
##  $ Unemployed          : int  37 85 16 40 1672 400 308 33 4650 3895 ...
##  $ Unemployment_rate   : num  0.0184 0.1172 0.0241 0.0501 0.0611 ...
##  $ Median              : int  110000 75000 73000 70000 65000 65000 62000 62000 60000 60000 ...
##  $ P25th               : int  95000 55000 50000 43000 50000 50000 53000 31500 48000 45000 ...
##  $ P75th               : int  125000 90000 105000 80000 75000 102000 72000 109000 70000 72000 ...
##  $ College_jobs        : int  1534 350 456 529 18314 1142 1768 972 52844 45829 ...
##  $ Non_college_jobs    : int  364 257 176 102 4440 657 314 500 16384 10874 ...
##  $ Low_wage_jobs       : int  193 50 0 0 972 244 259 220 3253 3170 ...
##  - attr(*, "na.action")= 'omit' Named int 22
##   ..- attr(*, "names")= chr "22"

As can be seen above, most of the variables are integers. Several of them can be converted to factor variables alongside the numeric ones. In addition, the variables Rank, Major_code, and Major can be dropped: Rank is highly correlated with the salary variable, and the other two are too specific to generalize from.

majors_added_categorical <- majors_raw %>%
  mutate(Over.50K = ifelse(Median > 50000, "Over", "Under.Equal"),
         High.Unemployment = ifelse(Unemployment_rate > 0.5, "High", "Low")) %>%  # note: a 0.5 (50%) cutoff means virtually every major ends up "Low"
  select(-1, -2, -3)  # drop Rank, Major_code, Major

In addition, the major-category levels can be collapsed into a few broader groups, which makes them more useful for the analysis.

## 
## Sciences     Arts    Other     STEM 
##       54       30       48       40
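The collapsing step itself is not shown above; a minimal sketch using forcats is given below. The exact category-to-group mapping is illustrative, not necessarily the one that produced the counts above.

```r
library(forcats)

# Collapse the many Major_category levels into four broad groups
# (the level names on the right are examples, not the full mapping)
majors_factors <- majors_added_categorical %>%
  mutate(Major_category = fct_collapse(factor(Major_category),
    STEM     = c("Engineering", "Computers & Mathematics"),
    Sciences = c("Biology & Life Science", "Physical Sciences"),
    Arts     = c("Arts", "Humanities & Liberal Arts"),
    other_level = "Other"))

table(majors_factors$Major_category)
```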

In order to run the models, all categorical variables need to be one-hot encoded, which is done below:

# One Hot Encoded Data
majors_onehot <- one_hot(data.table(majors_factors), cols = c("Major_category", "High.Unemployment"))
# Normal Data
majors <- majors_factors

Exploratory Data Analysis

Before beginning the analytical part of the project, it is helpful to visualize and summarize the data to better understand it as a whole, with an emphasis on the variables believed to be important for the analysis. Below are a summary of the Median salary variable and the correlation matrix of the numeric variables.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22000   33000   36000   40077   45000  110000
##                 Total        Men     Women ShareWomen Sample_size  Employed
## Total       1.0000000  0.8780884 0.9447645  0.1429993   0.9455747 0.9962140
## Men         0.8780884  1.0000000 0.6727589 -0.1120136   0.8751756 0.8706047
## Women       0.9447645  0.6727589 1.0000000  0.2978321   0.8626064 0.9440365
## ShareWomen  0.1429993 -0.1120136 0.2978321  1.0000000   0.0974957 0.1475468
## Sample_size 0.9455747  0.8751756 0.8626064  0.0974957   1.0000000 0.9644062
##             Full_time Part_time Full_time_year_round Unemployed
## Total       0.9893392 0.9502684            0.9811118  0.9747684
## Men         0.8935631 0.7515917            0.8924540  0.8694115
## Women       0.9176812 0.9545133            0.9057195  0.9116943
## ShareWomen  0.1202001 0.2122898            0.1125230  0.1212430
## Sample_size 0.9783624 0.8245444            0.9852125  0.9179335
##             Unemployment_rate     Median       P25th       P75th College_jobs
## Total              0.08319170 -0.1067377 -0.07192608 -0.08319767    0.8004648
## Men                0.10150234  0.0259906  0.03872518  0.05239290    0.5631684
## Women              0.05910776 -0.1828419 -0.13773826 -0.16452834    0.8519460
## ShareWomen         0.07320458 -0.6186898 -0.50019863 -0.58693216    0.1955501
## Sample_size        0.06295494 -0.0644750 -0.02442859 -0.05225614    0.7012309
##             Non_college_jobs Low_wage_jobs
## Total              0.9412471     0.9355096
## Men                0.8514998     0.7913360
## Women              0.8721318     0.9044699
## ShareWomen         0.1370066     0.1878496
## Sample_size        0.9153352     0.8601159
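The output above can be produced along these lines (a sketch; `majors` is the cleaned data frame from the previous section):

```r
# Five-number summary (plus mean) of median salaries
summary(majors$Median)

# Pairwise correlations among the numeric columns
cor(majors[sapply(majors, is.numeric)])
```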

Data Visualization

Model Building: C5.0 Decision Tree

The data is split into training (121 rows), tuning (26 rows), and test (25 rows) sets, and a C5.0 classifier predicting the Over.50K label is fit with repeated 10-fold cross-validation.

## [1] 172  22
## [1] 121  22
## [1] 26 22
## [1] 25 22
## Classes 'data.table' and 'data.frame':   121 obs. of  21 variables:
##  $ Total                  : int  2339 756 856 2573 3777 91227 81527 41542 15058 14955 ...
##  $ Men                    : int  2057 679 725 2200 2110 80320 65511 33258 12953 8407 ...
##  $ Women                  : int  282 77 131 373 1667 10907 16016 8284 2105 6548 ...
##  $ Major_category_Sciences: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Major_category_Arts    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Major_category_Other   : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ Major_category_STEM    : int  1 1 1 1 0 1 1 1 1 1 ...
##  $ ShareWomen             : num  0.121 0.102 0.153 0.145 0.441 ...
##  $ Sample_size            : int  36 7 3 17 51 1029 631 399 147 79 ...
##  $ Employed               : int  1976 640 648 1857 2912 76442 61928 32506 11391 10047 ...
##  $ Full_time              : int  1849 556 558 2038 2924 71298 55450 30315 11106 9017 ...
##  $ Part_time              : int  270 170 133 264 296 13101 12695 5146 2724 2694 ...
##  $ Full_time_year_round   : int  1207 388 340 1449 2482 54639 41413 23621 8790 5986 ...
##  $ Unemployed             : int  37 85 16 400 308 4650 3895 2275 794 1019 ...
##  $ Unemployment_rate      : num  0.0184 0.1172 0.0241 0.1772 0.0957 ...
##  $ P25th                  : int  95000 55000 50000 50000 53000 48000 45000 45000 42000 36000 ...
##  $ P75th                  : int  125000 90000 105000 102000 72000 70000 72000 75000 70000 70000 ...
##  $ College_jobs           : int  1534 350 456 1142 1768 52844 45829 23694 8184 6439 ...
##  $ Non_college_jobs       : int  364 257 176 657 314 16384 10874 5721 2425 2471 ...
##  $ Low_wage_jobs          : int  193 50 0 244 259 3253 3170 980 372 789 ...
##  $ High.Unemployment_Low  : int  1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr>
## C5.0 
## 
## 121 samples
##  21 predictor
##   2 classes: 'Over', 'Under.Equal' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 109, 109, 108, 110, 109, 110, ... 
## Resampling results across tuning parameters:
## 
##   model  winnow  trials  Accuracy   Kappa    
##   rules  FALSE    1      0.9279604  0.6789588
##   rules  FALSE   10      0.9276224  0.7510563
##   rules  FALSE   20      0.9292890  0.7585563
##   rules   TRUE    1      0.9398019  0.7054988
##   rules   TRUE   10      0.9280070  0.6870446
##   rules   TRUE   20      0.9280070  0.6870446
##   tree   FALSE    1      0.9278089  0.6813726
##   tree   FALSE   10      0.9359557  0.7880563
##   tree   FALSE   20      0.9327506  0.7786667
##   tree    TRUE    1      0.9398019  0.7054988
##   tree    TRUE   10      0.9280070  0.6870446
##   tree    TRUE   20      0.9280070  0.6870446
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 1, model = rules and winnow
##  = TRUE.
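The cross-validated fit above is consistent with a caret call along these lines (a sketch; the object names `fitControl` and `c50_model` are assumptions):

```r
library(caret)

# Repeated 10-fold cross-validation, 5 repeats, matching the summary above
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 5)

set.seed(2023)
c50_model <- train(Over.50K ~ .,
                   data = train,
                   method = "C5.0",
                   trControl = fitControl,
                   metric = "Accuracy")
```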

Prediction

## Confusion Matrix and Statistics
## 
##              Actual
## Prediction    Over Under.Equal
##   Over           3           1
##   Under.Equal    1          21
##                                           
##                Accuracy : 0.9231          
##                  95% CI : (0.7487, 0.9905)
##     No Information Rate : 0.8462          
##     P-Value [Acc > NIR] : 0.214           
##                                           
##                   Kappa : 0.7045          
##                                           
##  Mcnemar's Test P-Value : 1.000           
##                                           
##             Sensitivity : 0.7500          
##             Specificity : 0.9545          
##          Pos Pred Value : 0.7500          
##          Neg Pred Value : 0.9545          
##              Prevalence : 0.1538          
##          Detection Rate : 0.1154          
##    Detection Prevalence : 0.1538          
##       Balanced Accuracy : 0.8523          
##                                           
##        'Positive' Class : Over            
## 
# Given certain values of the other variables, predict the median salary category
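The confusion matrix above comes from predicting on held-out data, roughly as sketched below (`c50_model` and the `tune` split are assumed from the earlier steps):

```r
# Predict the salary class on the tuning set
c50_predictions <- predict(c50_model, tune)

# Compare predictions against the actual labels, with "Over" as the positive class
confusionMatrix(c50_predictions,
                tune$Over.50K,
                positive = "Over")
```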

Evaluation

The C5.0 variable importance is shown first, followed by the model re-tuned over larger trial counts (20/30/40), the original cross-validation results for comparison, and confusion matrices on the tuning and test sets.

## C5.0 variable importance
## 
##   only 20 most important variables shown (out of 21)
## 
##                         Overall
## P75th                   100.000
## P25th                    81.358
## Major_category_STEM      80.507
## ShareWomen                4.235
## College_jobs              0.000
## Employed                  0.000
## Non_college_jobs          0.000
## Full_time                 0.000
## Sample_size               0.000
## Unemployment_rate         0.000
## Total                     0.000
## Men                       0.000
## Part_time                 0.000
## Unemployed                0.000
## Major_category_Arts       0.000
## Low_wage_jobs             0.000
## High.Unemployment_Low     0.000
## Major_category_Sciences   0.000
## Women                     0.000
## Major_category_Other      0.000
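The importance table above is the standard caret extraction (a sketch, assuming the fitted model object is named `c50_model`):

```r
# C5.0 attribute usage, scaled 0-100 by caret
varImp(c50_model, scale = TRUE)
```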
## C5.0 
## 
## 121 samples
##  21 predictor
##   2 classes: 'Over', 'Under.Equal' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 109, 109, 108, 110, 109, 110, ... 
## Resampling results across tuning parameters:
## 
##   model  winnow  trials  Accuracy   Kappa    
##   rules  FALSE   20      0.9292890  0.7585563
##   rules  FALSE   30      0.9292890  0.7585563
##   rules  FALSE   40      0.9292890  0.7585563
##   rules   TRUE   20      0.9280070  0.6870446
##   rules   TRUE   30      0.9280070  0.6870446
##   rules   TRUE   40      0.9280070  0.6870446
##   tree   FALSE   20      0.9327506  0.7786667
##   tree   FALSE   30      0.9359557  0.7880563
##   tree   FALSE   40      0.9359557  0.7880563
##   tree    TRUE   20      0.9280070  0.6870446
##   tree    TRUE   30      0.9280070  0.6870446
##   tree    TRUE   40      0.9280070  0.6870446
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 30, model = tree and winnow
##  = FALSE.
## C5.0 
## 
## 121 samples
##  21 predictor
##   2 classes: 'Over', 'Under.Equal' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 109, 109, 108, 110, 109, 110, ... 
## Resampling results across tuning parameters:
## 
##   model  winnow  trials  Accuracy   Kappa    
##   rules  FALSE    1      0.9279604  0.6789588
##   rules  FALSE   10      0.9276224  0.7510563
##   rules  FALSE   20      0.9292890  0.7585563
##   rules   TRUE    1      0.9398019  0.7054988
##   rules   TRUE   10      0.9280070  0.6870446
##   rules   TRUE   20      0.9280070  0.6870446
##   tree   FALSE    1      0.9278089  0.6813726
##   tree   FALSE   10      0.9359557  0.7880563
##   tree   FALSE   20      0.9327506  0.7786667
##   tree    TRUE    1      0.9398019  0.7054988
##   tree    TRUE   10      0.9280070  0.6870446
##   tree    TRUE   20      0.9280070  0.6870446
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 1, model = rules and winnow
##  = TRUE.
## Confusion Matrix and Statistics
## 
##              Actual
## Prediction    Over Under.Equal
##   Over           3           1
##   Under.Equal    1          21
##                                           
##                Accuracy : 0.9231          
##                  95% CI : (0.7487, 0.9905)
##     No Information Rate : 0.8462          
##     P-Value [Acc > NIR] : 0.214           
##                                           
##                   Kappa : 0.7045          
##                                           
##  Mcnemar's Test P-Value : 1.000           
##                                           
##             Sensitivity : 0.7500          
##             Specificity : 0.9545          
##          Pos Pred Value : 0.7500          
##          Neg Pred Value : 0.9545          
##              Prevalence : 0.1538          
##          Detection Rate : 0.1154          
##    Detection Prevalence : 0.1538          
##       Balanced Accuracy : 0.8523          
##                                           
##        'Positive' Class : Over            
## 
## Confusion Matrix and Statistics
## 
##              Actual
## Prediction    Over Under.Equal
##   Over           2           2
##   Under.Equal    1          20
##                                           
##                Accuracy : 0.88            
##                  95% CI : (0.6878, 0.9745)
##     No Information Rate : 0.88            
##     P-Value [Acc > NIR] : 0.6475          
##                                           
##                   Kappa : 0.5033          
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.6667          
##             Specificity : 0.9091          
##          Pos Pred Value : 0.5000          
##          Neg Pred Value : 0.9524          
##              Prevalence : 0.1200          
##          Detection Rate : 0.0800          
##    Detection Prevalence : 0.1600          
##       Balanced Accuracy : 0.7879          
##                                           
##        'Positive' Class : Over            
## 

Model Building: Random Forest Classification

# Create a combined target: median salary scaled by (1 - unemployment rate) and the share of women
combined_target <- majors$Median * (1 - majors$Unemployment_rate) * majors$ShareWomen
majors_combined_target <- data.frame(majors, combined_target)

# view(majors_combined_target)
# Binarize the combined target at 20,000
majors_combined_target$combined_target <- ifelse(majors_combined_target$combined_target > 20000, 1, 0)

# Convert the 0/1 indicator to a two-level factor (the 0.3953488 break is the class prevalence; any break strictly between 0 and 1 gives the same split)
(majors_combined_target$combined_target <- cut(majors_combined_target$combined_target, c(-1, 0.3953488, 1), labels = c(0, 1)))
##   [1] 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
##  [38] 0 1 1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0
##  [75] 0 0 1 0 0 1 0 0 0 0 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 0 0
## [112] 0 1 0 1 1 0 1 1 1 0 0 0 1 0 1 1 1 1 0 1 0 0 1 0 1 1 1 0 1 0 0 0 1 1 0 0 1
## [149] 0 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
## Levels: 0 1
# Relabel the factor levels (note: the cutoff above is 20K, so the "G.50K" label is a misnomer)
majors_combined_target$combined_target <- fct_collapse(majors_combined_target$combined_target, "LE.EQ.20K"="0", "G.50K"="1")
majors_combined_target <- majors_combined_target %>%
  mutate(combined_target = factor(combined_target, labels = make.names(levels(combined_target))))

str(majors_combined_target)
## 'data.frame':    172 obs. of  21 variables:
##  $ Total               : int  2339 756 856 1258 32260 2573 3777 1792 91227 81527 ...
##  $ Men                 : int  2057 679 725 1123 21239 2200 2110 832 80320 65511 ...
##  $ Women               : int  282 77 131 135 11021 373 1667 960 10907 16016 ...
##  $ Major_category      : Factor w/ 4 levels "Sciences","Arts",..: 4 4 4 4 4 4 3 1 4 4 ...
##  $ ShareWomen          : num  0.121 0.102 0.153 0.107 0.342 ...
##  $ Sample_size         : int  36 7 3 16 289 17 51 10 1029 631 ...
##  $ Employed            : int  1976 640 648 758 25694 1857 2912 1526 76442 61928 ...
##  $ Full_time           : int  1849 556 558 1069 23170 2038 2924 1085 71298 55450 ...
##  $ Part_time           : int  270 170 133 150 5180 264 296 553 13101 12695 ...
##  $ Full_time_year_round: int  1207 388 340 692 16697 1449 2482 827 54639 41413 ...
##  $ Unemployed          : int  37 85 16 40 1672 400 308 33 4650 3895 ...
##  $ Unemployment_rate   : num  0.0184 0.1172 0.0241 0.0501 0.0611 ...
##  $ Median              : int  110000 75000 73000 70000 65000 65000 62000 62000 60000 60000 ...
##  $ P25th               : int  95000 55000 50000 43000 50000 50000 53000 31500 48000 45000 ...
##  $ P75th               : int  125000 90000 105000 80000 75000 102000 72000 109000 70000 72000 ...
##  $ College_jobs        : int  1534 350 456 529 18314 1142 1768 972 52844 45829 ...
##  $ Non_college_jobs    : int  364 257 176 102 4440 657 314 500 16384 10874 ...
##  $ Low_wage_jobs       : int  193 50 0 0 972 244 259 220 3253 3170 ...
##  $ Over.50K            : Factor w/ 2 levels "Over","Under.Equal": 1 1 1 1 1 1 1 1 1 1 ...
##  $ High.Unemployment   : Factor w/ 1 level "Low": 1 1 1 1 1 1 1 1 1 1 ...
##  $ combined_target     : Factor w/ 2 levels "LE.EQ.20K","G.50K": 1 1 1 1 2 1 2 2 1 1 ...
# Determine the base rate (prevalence) for the classifier

(prevalence <- table(majors_combined_target$combined_target)[[2]]/length(majors_combined_target$combined_target))
## [1] 0.3953488
table(majors_combined_target$combined_target)
## 
## LE.EQ.20K     G.50K 
##       104        68
# Split data into Train, Tune, Test
part_index_1 <- caret::createDataPartition(majors_combined_target$combined_target,
                                           times=1,
                                           p = 0.70,
                                           groups=1,
                                           list=FALSE)
train <- majors_combined_target[part_index_1, ]
tune_and_test <- majors_combined_target[-part_index_1, ]
# Then we need to use the function again to create the tuning set
tune_and_test_index <- createDataPartition(tune_and_test$combined_target,
                                           p = .5,
                                           list = FALSE,
                                           times = 1)
tune <- tune_and_test[tune_and_test_index, ]
test <- tune_and_test[-tune_and_test_index, ]
dim(train)
## [1] 121  21
dim(test)
## [1] 25 21
dim(tune)
## [1] 26 21
# These counts are slightly off from a perfect 70/15/15 split because the data set isn't evenly divisible
# Calculate the initial mtry value
mytry_tune <- function(x){
  y <- dim(x)[2]-1
  sqrt(y)
}

mytry_tune(majors_combined_target)
## [1] 4.472136
# Creating an initial random forest model with 500 trees
set.seed(2023)
combined_RF = randomForest(combined_target~.,  #<- formula: response ~ predictors; "." means all other variables
                           train,              #<- training data frame
                           ntree = 500,        #<- number of trees; large enough that every row is classified several times
                           mtry = 4,           #<- variables sampled at each split; classification default is sqrt(# of variables)
                           replace = TRUE,     #<- sample data points with replacement
                           sampsize = 100,     #<- size of the sample drawn for each tree
                           nodesize = 5,       #<- minimum number of data points in terminal nodes
                           importance = TRUE,  #<- assess predictor importance
                           proximity = FALSE,  #<- skip the row-proximity matrix
                           norm.votes = TRUE,  #<- report final votes as fractions rather than raw counts
                           do.trace = TRUE,    #<- print the OOB error as the forest grows
                           keep.forest = TRUE, #<- retain the forest in the output object
                           keep.inbag = TRUE)  #<- track which samples are in-bag in each tree
## ntree      OOB      1      2
##     1:  29.82% 36.67% 22.22%
##     2:  23.26% 25.49% 20.00%
##     3:  24.75% 23.73% 26.19%
##     4:  21.10% 19.05% 23.91%
##     5:  22.81% 16.18% 32.61%
## ...
##   100:  17.36% 10.96% 27.08%
## ...
##   340:  20.66% 10.96% 35.42%
## (per-tree OOB trace elided; the OOB error stabilizes around 17-21% as trees accumulate)
##   341:  20.66% 10.96% 35.42%
##   342:  20.66% 10.96% 35.42%
##   343:  20.66% 10.96% 35.42%
##   344:  20.66% 10.96% 35.42%
##   345:  20.66% 10.96% 35.42%
##   346:  20.66% 10.96% 35.42%
##   347:  20.66% 10.96% 35.42%
##   348:  20.66% 10.96% 35.42%
##   349:  20.66% 10.96% 35.42%
##   350:  20.66% 10.96% 35.42%
##   351:  20.66% 10.96% 35.42%
##   352:  20.66% 10.96% 35.42%
##   353:  20.66% 10.96% 35.42%
##   354:  20.66% 10.96% 35.42%
##   355:  20.66% 10.96% 35.42%
##   356:  20.66% 10.96% 35.42%
##   357:  20.66% 10.96% 35.42%
##   358:  20.66% 10.96% 35.42%
##   359:  20.66% 10.96% 35.42%
##   360:  20.66% 10.96% 35.42%
##   361:  20.66% 10.96% 35.42%
##   362:  20.66% 10.96% 35.42%
##   363:  20.66% 10.96% 35.42%
##   364:  20.66% 10.96% 35.42%
##   365:  20.66% 10.96% 35.42%
##   366:  20.66% 10.96% 35.42%
##   367:  20.66% 10.96% 35.42%
##   368:  20.66% 10.96% 35.42%
##   369:  20.66% 10.96% 35.42%
##   370:  20.66% 10.96% 35.42%
##   371:  20.66% 10.96% 35.42%
##   372:  20.66% 10.96% 35.42%
##   373:  20.66% 10.96% 35.42%
##   374:  20.66% 10.96% 35.42%
##   375:  20.66% 10.96% 35.42%
##   376:  21.49% 12.33% 35.42%
##   377:  21.49% 12.33% 35.42%
##   378:  21.49% 12.33% 35.42%
##   379:  21.49% 12.33% 35.42%
##   380:  21.49% 12.33% 35.42%
##   381:  21.49% 12.33% 35.42%
##   382:  21.49% 12.33% 35.42%
##   383:  21.49% 12.33% 35.42%
##   384:  21.49% 12.33% 35.42%
##   385:  21.49% 12.33% 35.42%
##   386:  21.49% 12.33% 35.42%
##   387:  20.66% 10.96% 35.42%
##   388:  20.66% 10.96% 35.42%
##   389:  20.66% 10.96% 35.42%
##   390:  20.66% 10.96% 35.42%
##   391:  21.49% 12.33% 35.42%
##   392:  21.49% 12.33% 35.42%
##   393:  21.49% 12.33% 35.42%
##   394:  21.49% 12.33% 35.42%
##   395:  21.49% 12.33% 35.42%
##   396:  20.66% 10.96% 35.42%
##   397:  20.66% 10.96% 35.42%
##   398:  19.83% 10.96% 33.33%
##   399:  20.66% 12.33% 33.33%
##   400:  21.49% 12.33% 35.42%
##   401:  20.66% 10.96% 35.42%
##   402:  20.66% 12.33% 33.33%
##   403:  19.83% 10.96% 33.33%
##   404:  19.83% 10.96% 33.33%
##   405:  19.83% 10.96% 33.33%
##   406:  19.83% 10.96% 33.33%
##   407:  20.66% 12.33% 33.33%
##   408:  19.83% 10.96% 33.33%
##   409:  19.83% 12.33% 31.25%
##   410:  19.83% 10.96% 33.33%
##   411:  19.01% 10.96% 31.25%
##   412:  19.01% 10.96% 31.25%
##   413:  19.83% 10.96% 33.33%
##   414:  19.83% 10.96% 33.33%
##   415:  19.83% 10.96% 33.33%
##   416:  19.83% 10.96% 33.33%
##   417:  19.83% 12.33% 31.25%
##   418:  19.83% 10.96% 33.33%
##   419:  20.66% 12.33% 33.33%
##   420:  20.66% 12.33% 33.33%
##   421:  20.66% 12.33% 33.33%
##   422:  20.66% 12.33% 33.33%
##   423:  20.66% 12.33% 33.33%
##   424:  20.66% 12.33% 33.33%
##   425:  20.66% 12.33% 33.33%
##   426:  20.66% 12.33% 33.33%
##   427:  20.66% 12.33% 33.33%
##   428:  20.66% 12.33% 33.33%
##   429:  20.66% 12.33% 33.33%
##   430:  20.66% 12.33% 33.33%
##   431:  20.66% 12.33% 33.33%
##   432:  19.83% 12.33% 31.25%
##   433:  19.83% 12.33% 31.25%
##   434:  19.83% 12.33% 31.25%
##   435:  19.83% 12.33% 31.25%
##   436:  19.83% 12.33% 31.25%
##   437:  19.83% 12.33% 31.25%
##   438:  19.83% 12.33% 31.25%
##   439:  19.83% 12.33% 31.25%
##   440:  19.83% 12.33% 31.25%
##   441:  19.83% 12.33% 31.25%
##   442:  19.83% 12.33% 31.25%
##   443:  19.83% 12.33% 31.25%
##   444:  19.83% 12.33% 31.25%
##   445:  19.83% 12.33% 31.25%
##   446:  19.83% 12.33% 31.25%
##   447:  20.66% 13.70% 31.25%
##   448:  21.49% 13.70% 33.33%
##   449:  21.49% 13.70% 33.33%
##   450:  21.49% 13.70% 33.33%
##   451:  20.66% 12.33% 33.33%
##   452:  20.66% 12.33% 33.33%
##   453:  20.66% 12.33% 33.33%
##   454:  20.66% 12.33% 33.33%
##   455:  20.66% 12.33% 33.33%
##   456:  20.66% 12.33% 33.33%
##   457:  20.66% 12.33% 33.33%
##   458:  19.83% 12.33% 31.25%
##   459:  20.66% 12.33% 33.33%
##   460:  20.66% 12.33% 33.33%
##   461:  20.66% 12.33% 33.33%
##   462:  20.66% 12.33% 33.33%
##   463:  20.66% 12.33% 33.33%
##   464:  20.66% 12.33% 33.33%
##   465:  20.66% 12.33% 33.33%
##   466:  20.66% 12.33% 33.33%
##   467:  20.66% 12.33% 33.33%
##   468:  20.66% 12.33% 33.33%
##   469:  20.66% 12.33% 33.33%
##   470:  19.83% 12.33% 31.25%
##   471:  20.66% 12.33% 33.33%
##   472:  19.01% 10.96% 31.25%
##   473:  20.66% 12.33% 33.33%
##   474:  19.01% 10.96% 31.25%
##   475:  19.83% 12.33% 31.25%
##   476:  19.83% 12.33% 31.25%
##   477:  19.83% 12.33% 31.25%
##   478:  19.83% 12.33% 31.25%
##   479:  19.83% 12.33% 31.25%
##   480:  19.83% 12.33% 31.25%
##   481:  19.83% 12.33% 31.25%
##   482:  19.83% 12.33% 31.25%
##   483:  19.83% 12.33% 31.25%
##   484:  19.83% 12.33% 31.25%
##   485:  19.83% 12.33% 31.25%
##   486:  19.83% 12.33% 31.25%
##   487:  19.83% 12.33% 31.25%
##   488:  19.83% 12.33% 31.25%
##   489:  19.83% 12.33% 31.25%
##   490:  19.83% 12.33% 31.25%
##   491:  19.83% 12.33% 31.25%
##   492:  19.83% 12.33% 31.25%
##   493:  19.83% 12.33% 31.25%
##   494:  19.83% 12.33% 31.25%
##   495:  19.83% 12.33% 31.25%
##   496:  19.83% 12.33% 31.25%
##   497:  19.83% 12.33% 31.25%
##   498:  19.83% 12.33% 31.25%
##   499:  19.83% 12.33% 31.25%
##   500:  19.83% 12.33% 31.25%
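The per-tree lines above come from `do.trace = TRUE`. If the full trace is more noise than signal, `randomForest` also accepts an integer for `do.trace`, printing the OOB error only at that interval. A minimal sketch of the same call with a sparser trace (other arguments assumed unchanged from the fit shown below):

```r
# Same fit, but report the cumulative OOB error only every
# 100 trees instead of after every single tree.
combined_RF <- randomForest(combined_target ~ ., data = train,
                            ntree = 500, mtry = 4,
                            importance = TRUE,
                            do.trace = 100)  # integer = print interval
```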
# Look at the output of the random forest.
combined_RF
## 
## Call:
##  randomForest(formula = combined_target ~ ., data = train, ntree = 500,      mtry = 4, replace = TRUE, sampsize = 100, nodesize = 5, importance = TRUE,      proximity = FALSE, norm.votes = TRUE, do.trace = TRUE, keep.forest = TRUE,      keep.inbag = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 4
## 
##         OOB estimate of  error rate: 19.83%
## Confusion matrix:
##           LE.EQ.20K G.50K class.error
## LE.EQ.20K        64     9   0.1232877
## G.50K            15    33   0.3125000
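As a sanity check, each `class.error` in the confusion matrix is simply the off-diagonal (misclassified) count divided by the row total, and the OOB error rate is the total misclassified count over all observations. A quick sketch reproducing the figures above:

```r
# class.error = off-diagonal count / row total
9 / (64 + 9)    # LE.EQ.20K: 0.1232877
15 / (15 + 33)  # G.50K:     0.3125
# Overall OOB error: (9 + 15) / (73 + 48) = 0.1983, i.e. 19.83%
```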
# Determining the number of trees that should be used

# The "err.rate" component is a matrix of cumulative error rates
# after each tree: the overall out-of-bag (OOB) error plus the
# per-class errors, computed on the data points left out of each
# tree's bootstrap sample.
# View(as.data.frame(combined_RF$err.rate))

err.rate <- as.data.frame(combined_RF$err.rate)

# View(err.rate)

# The "oob.times" component records how many times each data point
# was out-of-bag, i.e. excluded from a tree's bootstrap sample.

# View(as.data.frame(combined_RF$oob.times))

combined_RF_error <- data.frame(1:nrow(combined_RF$err.rate),
                                combined_RF$err.rate)
combined_RF_error
##     X1.nrow.combined_RF.err.rate.       OOB  LE.EQ.20K     G.50K
## 1                               1 0.2982456 0.36666667 0.2222222
## 2                               2 0.2325581 0.25490196 0.2000000
## 3                               3 0.2475248 0.23728814 0.2619048
## 4                               4 0.2110092 0.19047619 0.2391304
## 5                               5 0.2280702 0.16176471 0.3260870
## 6                               6 0.2288136 0.18309859 0.2978723
## 7                               7 0.2000000 0.15068493 0.2765957
## 8                               8 0.2250000 0.19178082 0.2765957
## 9                               9 0.1735537 0.10958904 0.2708333
## 10                             10 0.1900826 0.12328767 0.2916667
## ...   (rows 11-376 truncated for brevity; after the first ~10 trees the
## OOB error fluctuates between roughly 0.17 and 0.21)
## 377                           377 0.2148760 0.12328767 0.3541667
## 378                           378 0.2148760 0.12328767 0.3541667
## 379                           379 0.2148760 0.12328767 0.3541667
## 380                           380 0.2148760 0.12328767 0.3541667
## 381                           381 0.2148760 0.12328767 0.3541667
## 382                           382 0.2148760 0.12328767 0.3541667
## 383                           383 0.2148760 0.12328767 0.3541667
## 384                           384 0.2148760 0.12328767 0.3541667
## 385                           385 0.2148760 0.12328767 0.3541667
## 386                           386 0.2148760 0.12328767 0.3541667
## 387                           387 0.2066116 0.10958904 0.3541667
## 388                           388 0.2066116 0.10958904 0.3541667
## 389                           389 0.2066116 0.10958904 0.3541667
## 390                           390 0.2066116 0.10958904 0.3541667
## 391                           391 0.2148760 0.12328767 0.3541667
## 392                           392 0.2148760 0.12328767 0.3541667
## 393                           393 0.2148760 0.12328767 0.3541667
## 394                           394 0.2148760 0.12328767 0.3541667
## 395                           395 0.2148760 0.12328767 0.3541667
## 396                           396 0.2066116 0.10958904 0.3541667
## 397                           397 0.2066116 0.10958904 0.3541667
## 398                           398 0.1983471 0.10958904 0.3333333
## 399                           399 0.2066116 0.12328767 0.3333333
## 400                           400 0.2148760 0.12328767 0.3541667
## 401                           401 0.2066116 0.10958904 0.3541667
## 402                           402 0.2066116 0.12328767 0.3333333
## 403                           403 0.1983471 0.10958904 0.3333333
## 404                           404 0.1983471 0.10958904 0.3333333
## 405                           405 0.1983471 0.10958904 0.3333333
## 406                           406 0.1983471 0.10958904 0.3333333
## 407                           407 0.2066116 0.12328767 0.3333333
## 408                           408 0.1983471 0.10958904 0.3333333
## 409                           409 0.1983471 0.12328767 0.3125000
## 410                           410 0.1983471 0.10958904 0.3333333
## 411                           411 0.1900826 0.10958904 0.3125000
## 412                           412 0.1900826 0.10958904 0.3125000
## 413                           413 0.1983471 0.10958904 0.3333333
## 414                           414 0.1983471 0.10958904 0.3333333
## 415                           415 0.1983471 0.10958904 0.3333333
## 416                           416 0.1983471 0.10958904 0.3333333
## 417                           417 0.1983471 0.12328767 0.3125000
## 418                           418 0.1983471 0.10958904 0.3333333
## 419                           419 0.2066116 0.12328767 0.3333333
## 420                           420 0.2066116 0.12328767 0.3333333
## 421                           421 0.2066116 0.12328767 0.3333333
## 422                           422 0.2066116 0.12328767 0.3333333
## 423                           423 0.2066116 0.12328767 0.3333333
## 424                           424 0.2066116 0.12328767 0.3333333
## 425                           425 0.2066116 0.12328767 0.3333333
## 426                           426 0.2066116 0.12328767 0.3333333
## 427                           427 0.2066116 0.12328767 0.3333333
## 428                           428 0.2066116 0.12328767 0.3333333
## 429                           429 0.2066116 0.12328767 0.3333333
## 430                           430 0.2066116 0.12328767 0.3333333
## 431                           431 0.2066116 0.12328767 0.3333333
## 432                           432 0.1983471 0.12328767 0.3125000
## 433                           433 0.1983471 0.12328767 0.3125000
## 434                           434 0.1983471 0.12328767 0.3125000
## 435                           435 0.1983471 0.12328767 0.3125000
## 436                           436 0.1983471 0.12328767 0.3125000
## 437                           437 0.1983471 0.12328767 0.3125000
## 438                           438 0.1983471 0.12328767 0.3125000
## 439                           439 0.1983471 0.12328767 0.3125000
## 440                           440 0.1983471 0.12328767 0.3125000
## 441                           441 0.1983471 0.12328767 0.3125000
## 442                           442 0.1983471 0.12328767 0.3125000
## 443                           443 0.1983471 0.12328767 0.3125000
## 444                           444 0.1983471 0.12328767 0.3125000
## 445                           445 0.1983471 0.12328767 0.3125000
## 446                           446 0.1983471 0.12328767 0.3125000
## 447                           447 0.2066116 0.13698630 0.3125000
## 448                           448 0.2148760 0.13698630 0.3333333
## 449                           449 0.2148760 0.13698630 0.3333333
## 450                           450 0.2148760 0.13698630 0.3333333
## 451                           451 0.2066116 0.12328767 0.3333333
## 452                           452 0.2066116 0.12328767 0.3333333
## 453                           453 0.2066116 0.12328767 0.3333333
## 454                           454 0.2066116 0.12328767 0.3333333
## 455                           455 0.2066116 0.12328767 0.3333333
## 456                           456 0.2066116 0.12328767 0.3333333
## 457                           457 0.2066116 0.12328767 0.3333333
## 458                           458 0.1983471 0.12328767 0.3125000
## 459                           459 0.2066116 0.12328767 0.3333333
## 460                           460 0.2066116 0.12328767 0.3333333
## 461                           461 0.2066116 0.12328767 0.3333333
## 462                           462 0.2066116 0.12328767 0.3333333
## 463                           463 0.2066116 0.12328767 0.3333333
## 464                           464 0.2066116 0.12328767 0.3333333
## 465                           465 0.2066116 0.12328767 0.3333333
## 466                           466 0.2066116 0.12328767 0.3333333
## 467                           467 0.2066116 0.12328767 0.3333333
## 468                           468 0.2066116 0.12328767 0.3333333
## 469                           469 0.2066116 0.12328767 0.3333333
## 470                           470 0.1983471 0.12328767 0.3125000
## 471                           471 0.2066116 0.12328767 0.3333333
## 472                           472 0.1900826 0.10958904 0.3125000
## 473                           473 0.2066116 0.12328767 0.3333333
## 474                           474 0.1900826 0.10958904 0.3125000
## 475                           475 0.1983471 0.12328767 0.3125000
## 476                           476 0.1983471 0.12328767 0.3125000
## 477                           477 0.1983471 0.12328767 0.3125000
## 478                           478 0.1983471 0.12328767 0.3125000
## 479                           479 0.1983471 0.12328767 0.3125000
## 480                           480 0.1983471 0.12328767 0.3125000
## 481                           481 0.1983471 0.12328767 0.3125000
## 482                           482 0.1983471 0.12328767 0.3125000
## 483                           483 0.1983471 0.12328767 0.3125000
## 484                           484 0.1983471 0.12328767 0.3125000
## 485                           485 0.1983471 0.12328767 0.3125000
## 486                           486 0.1983471 0.12328767 0.3125000
## 487                           487 0.1983471 0.12328767 0.3125000
## 488                           488 0.1983471 0.12328767 0.3125000
## 489                           489 0.1983471 0.12328767 0.3125000
## 490                           490 0.1983471 0.12328767 0.3125000
## 491                           491 0.1983471 0.12328767 0.3125000
## 492                           492 0.1983471 0.12328767 0.3125000
## 493                           493 0.1983471 0.12328767 0.3125000
## 494                           494 0.1983471 0.12328767 0.3125000
## 495                           495 0.1983471 0.12328767 0.3125000
## 496                           496 0.1983471 0.12328767 0.3125000
## 497                           497 0.1983471 0.12328767 0.3125000
## 498                           498 0.1983471 0.12328767 0.3125000
## 499                           499 0.1983471 0.12328767 0.3125000
## 500                           500 0.1983471 0.12328767 0.3125000
colnames(combined_RF_error) = c("Number of Trees", "Out of Bag", "<=20K", ">20K")

combined_RF_error$Diff <- combined_RF_error$`>20K` - combined_RF_error$`<=20K`

# View(combined_RF_error)

# 54 trees should be used because that count coincides with the minimum OOB error and the minimum >20K class error.
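The tree count can also be read off programmatically instead of by inspecting the table; a minimal sketch, assuming `combined_RF_error` is ordered by tree count with the number of trees in column 1 and the OOB error in column 2 (column positions are assumptions):

```r
# Pick the smallest tree count that achieves the minimum OOB error.
min_oob    <- min(combined_RF_error[, 2])
best_ntree <- min(combined_RF_error[which(combined_RF_error[, 2] == min_oob), 1])
best_ntree
```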
#Determining the right number of variables to randomly sample (the mtry parameter)

str(train)
## 'data.frame':    121 obs. of  21 variables:
##  $ Total               : int  2339 756 1258 32260 3777 1792 91227 81527 15058 14955 ...
##  $ Men                 : int  2057 679 1123 21239 2110 832 80320 65511 12953 8407 ...
##  $ Women               : int  282 77 135 11021 1667 960 10907 16016 2105 6548 ...
##  $ Major_category      : Factor w/ 4 levels "Sciences","Arts",..: 4 4 4 4 3 1 4 4 4 4 ...
##  $ ShareWomen          : num  0.121 0.102 0.107 0.342 0.441 ...
##  $ Sample_size         : int  36 7 16 289 51 10 1029 631 147 79 ...
##  $ Employed            : int  1976 640 758 25694 2912 1526 76442 61928 11391 10047 ...
##  $ Full_time           : int  1849 556 1069 23170 2924 1085 71298 55450 11106 9017 ...
##  $ Part_time           : int  270 170 150 5180 296 553 13101 12695 2724 2694 ...
##  $ Full_time_year_round: int  1207 388 692 16697 2482 827 54639 41413 8790 5986 ...
##  $ Unemployed          : int  37 85 40 1672 308 33 4650 3895 794 1019 ...
##  $ Unemployment_rate   : num  0.0184 0.1172 0.0501 0.0611 0.0957 ...
##  $ Median              : int  110000 75000 70000 65000 62000 62000 60000 60000 60000 60000 ...
##  $ P25th               : int  95000 55000 43000 50000 53000 31500 48000 45000 42000 36000 ...
##  $ P75th               : int  125000 90000 80000 75000 72000 109000 70000 72000 70000 70000 ...
##  $ College_jobs        : int  1534 350 529 18314 1768 972 52844 45829 8184 6439 ...
##  $ Non_college_jobs    : int  364 257 102 4440 314 500 16384 10874 2425 2471 ...
##  $ Low_wage_jobs       : int  193 50 0 972 259 220 3253 3170 372 789 ...
##  $ Over.50K            : Factor w/ 2 levels "Over","Under.Equal": 1 1 1 1 1 1 1 1 1 1 ...
##  $ High.Unemployment   : Factor w/ 1 level "Low": 1 1 1 1 1 1 1 1 1 1 ...
##  $ combined_target     : Factor w/ 2 levels "LE.EQ.20K","G.50K": 1 1 1 2 2 2 1 1 1 2 ...
set.seed(2)
combined_RF_mtry = tuneRF(data.frame(train[ ,1:20]),  #<- data frame of predictor variables
                           (train[ ,21]),              #<- response vector: a factor for classification, numeric for regression
                           mtryStart = 4,                        #<- starting value of mtry, the default is the same as in the randomForest function
                           ntreeTry = 79,                        #<- number of trees to use at the tuning step
                           stepFactor = 2,                       #<- at each iteration, mtry is inflated (or deflated) by this value
                           improve = 0.05,                       #<- the improvement in OOB error must be by this much for the search to continue
                           trace = TRUE,                         #<- whether to print the progress of the search
                           plot = TRUE,                          #<- whether to plot the OOB error as a function of mtry
                           doBest = TRUE)                       #<- whether to create a random forest using the optimal mtry parameter
## mtry = 4  OOB error = 20.66% 
## Searching left ...
## mtry = 2     OOB error = 19.01% 
## 0.08 0.05 
## mtry = 1     OOB error = 29.75% 
## -0.5652174 0.05 
## Searching right ...
## mtry = 8     OOB error = 14.88% 
## 0.2173913 0.05 
## mtry = 16    OOB error = 9.09% 
## 0.3888889 0.05 
## mtry = 20    OOB error = 11.57% 
## -0.2727273 0.05

combined_RF_mtry
## 
## Call:
##  randomForest(x = x, y = y, mtry = res[which.min(res[, 2]), 1]) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 16
## 
##         OOB estimate of  error rate: 12.4%
## Confusion matrix:
##           LE.EQ.20K G.50K class.error
## LE.EQ.20K        65     8   0.1095890
## G.50K             7    41   0.1458333
# Based on the tuneRF output, a large mtry works best: mtry = 16 minimized the OOB error (9.09%), with mtry = 20 close behind (11.57%); both clearly beat 1, 2, 4, and 8. The model below samples all 20 predictors at each split.
# Build the random forest classification model for the combined category, using the number of trees, the number of variables to sample, and the sample size that optimize the model output.
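The optimal mtry can also be extracted programmatically; a sketch, assuming `tuneRF` is rerun with `doBest = FALSE` so it returns the search matrix (columns `mtry` and `OOBError`) instead of a fitted forest:

```r
set.seed(2)
mtry_search <- tuneRF(data.frame(train[, 1:20]), train[, 21],
                      mtryStart = 4, ntreeTry = 79, stepFactor = 2,
                      improve = 0.05, trace = FALSE, plot = FALSE,
                      doBest = FALSE)  # return the (mtry, OOBError) matrix
best_mtry <- mtry_search[which.min(mtry_search[, "OOBError"]), "mtry"]
best_mtry
```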
set.seed(2023)
combined_RF_2 = randomForest(combined_target~.,          #<- Formula: response variable ~ predictors.
                            #   The period means 'use all other variables in the data'.
                            train,     #<- A data frame with the variables to be used.
                            #y = NULL,           #<- A response vector. This is unnecessary because we're specifying a response formula.
                            #subset = NULL,      #<- This is unnecessary because we're using all the rows in the training data set.
                            #xtest = NULL,       #<- This is already defined in the formula by the ".".
                            #ytest = NULL,       #<- This is already defined in the formula by "combined_target".
                            ntree = 54,        #<- Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets classified at least a few times.
                            mtry = 20,            #<- Number of variables randomly sampled as candidates at each split. Default number for classification is sqrt(# of variables). Default number for regression is (# of variables / 3).
                            replace = TRUE,      #<- Should sampled data points be replaced.
                            #classwt = NULL,     #<- Priors of the classes. Use this if you want to specify what proportion of the data SHOULD be in each class. This is relevant if your sample data is not completely representative of the actual population 
                            #strata = NULL,      #<- Not necessary for our purpose here.
                            sampsize = 100,      #<- Size of sample to draw each time.
                            nodesize = 5,        #<- Minimum numbers of data points in terminal nodes.
                            #maxnodes = NULL,    #<- Limits the number of maximum splits. 
                            importance = TRUE,   #<- Should importance of predictors be assessed?
                            #localImp = FALSE,   #<- Should casewise importance measure be computed? (Setting this to TRUE will override importance.)
                            proximity = FALSE,    #<- Should a proximity measure between rows be calculated?
                            norm.votes = TRUE,   #<- If TRUE (default), the final result of votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs).
                            do.trace = TRUE,     #<- If set to TRUE, give a more verbose output as randomForest is run.
                            keep.forest = TRUE,  #<- If set to FALSE, the forest will not be retained in the output object. If xtest is given, defaults to FALSE.
                            keep.inbag = TRUE)   #<- Should an n by ntree matrix be returned that keeps track of which samples are in-bag in which trees? 
## ntree      OOB      1      2
##     1:  26.32% 33.33% 18.52%
##     2:  26.44% 24.49% 28.95%
##     3:  26.47% 28.33% 23.81%
##     4:  22.12% 22.06% 22.22%
##     5:  21.37% 21.43% 21.28%
##     6:  20.17% 19.72% 20.83%
##     7:  21.85% 18.31% 27.08%
##     8:  18.49% 18.31% 18.75%
##     9:  19.83% 17.81% 22.92%
##    10:  20.66% 19.18% 22.92%
##    11:  19.01% 17.81% 20.83%
##    12:  17.36% 15.07% 20.83%
##    13:  19.83% 15.07% 27.08%
##    14:  18.18% 13.70% 25.00%
##    15:  17.36% 15.07% 20.83%
##    16:  15.70% 15.07% 16.67%
##    17:  13.22% 12.33% 14.58%
##    18:  14.05% 13.70% 14.58%
##    19:  16.53% 15.07% 18.75%
##    20:  14.05% 10.96% 18.75%
##    21:  14.88% 12.33% 18.75%
##    22:  17.36% 16.44% 18.75%
##    23:  15.70% 15.07% 16.67%
##    24:  15.70% 13.70% 18.75%
##    25:  15.70% 13.70% 18.75%
##    26:  15.70% 15.07% 16.67%
##    27:  17.36% 15.07% 20.83%
##    28:  16.53% 15.07% 18.75%
##    29:  16.53% 15.07% 18.75%
##    30:  15.70% 16.44% 14.58%
##    31:  16.53% 15.07% 18.75%
##    32:  17.36% 15.07% 20.83%
##    33:  16.53% 13.70% 20.83%
##    34:  14.88% 15.07% 14.58%
##    35:  15.70% 13.70% 18.75%
##    36:  17.36% 16.44% 18.75%
##    37:  18.18% 15.07% 22.92%
##    38:  18.18% 16.44% 20.83%
##    39:  18.18% 16.44% 20.83%
##    40:  16.53% 16.44% 16.67%
##    41:  16.53% 16.44% 16.67%
##    42:  14.88% 17.81% 10.42%
##    43:  14.05% 16.44% 10.42%
##    44:  14.88% 17.81% 10.42%
##    45:  15.70% 17.81% 12.50%
##    46:  14.88% 16.44% 12.50%
##    47:  14.88% 16.44% 12.50%
##    48:  14.88% 16.44% 12.50%
##    49:  15.70% 16.44% 14.58%
##    50:  16.53% 16.44% 16.67%
##    51:  16.53% 16.44% 16.67%
##    52:  17.36% 17.81% 16.67%
##    53:  16.53% 16.44% 16.67%
##    54:  16.53% 16.44% 16.67%
# Look at the output of the random forest.
combined_RF_2
## 
## Call:
##  randomForest(formula = combined_target ~ ., data = train, ntree = 54,      mtry = 20, replace = TRUE, sampsize = 100, nodesize = 5,      importance = TRUE, proximity = FALSE, norm.votes = TRUE,      do.trace = TRUE, keep.forest = TRUE, keep.inbag = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 54
## No. of variables tried at each split: 20
## 
##         OOB estimate of  error rate: 16.53%
## Confusion matrix:
##           LE.EQ.20K G.50K class.error
## LE.EQ.20K        61    12   0.1643836
## G.50K             8    40   0.1666667
# The sample size was kept at the original value of 100 because this value minimized the class error for both classes. Increasing or decreasing the sample size lowered one class error but raised the other significantly, so 100 best balances the class errors and helps prevent over- or underfitting.
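The claim above can be checked by refitting the forest across a few candidate sample sizes and comparing the per-class errors; a sketch (the candidate values are assumptions):

```r
# Refit with the chosen ntree/mtry while varying only sampsize,
# then print the class errors from the OOB confusion matrix.
for (s in c(60, 80, 100, 120)) {
  set.seed(2023)
  fit <- randomForest(combined_target ~ ., train, ntree = 54, mtry = 20,
                      replace = TRUE, sampsize = s, nodesize = 5)
  cat("sampsize =", s, " class errors:",
      round(fit$confusion[, "class.error"], 3), "\n")
}
```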

Tuning

Because caret's built-in random forest method only exposes mtry for tuning, a custom random forest model specification was created so that all three hyperparameters identified above (mtry, ntree, and sampsize) could be tuned together.

# Tune the model
customRF <- list(type = "Classification", library = "randomForest", loop = NULL)
customRF$parameters <- data.frame(parameter = c("mtry", "ntree", "sampsize"), class = rep("numeric", 3), label = c("mtry", "ntree", "sampsize"))
customRF$grid <- function(x, y, len = NULL, search = "grid") {}
customRF$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    # Pass all three tuned hyperparameters (mtry, ntree, sampsize) through to randomForest.
    randomForest(x, y, mtry = param$mtry, ntree = param$ntree, sampsize = param$sampsize, ...)
}
customRF$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
  predict(modelFit, newdata)
customRF$prob <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
  predict(modelFit, newdata, type = "prob")
customRF$sort <- function(x) x[order(x[,1]),]
customRF$levels <- function(x) x$classes

Now, we can set the hyperparameter values to try and tune the model.
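The code that produced the grid and resampling output below was along these lines; a sketch in which the control settings are assumptions inferred from the output (5-fold CV repeated 5 times, ROC as the selection metric) and `train_caret` is a placeholder name for the training frame with the `Over.50K` response:

```r
library(caret)

# Candidate hyperparameter values for the custom random forest model.
tunegrid <- expand.grid(mtry = 3:5, sampsize = c(50, 100, 200),
                        ntree = c(200, 300, 400))
control  <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                         classProbs = TRUE, summaryFunction = twoClassSummary)

set.seed(2023)
custom_tune <- train(Over.50K ~ ., data = train_caret,  # train_caret is an assumption
                     method = customRF, metric = "ROC",
                     tuneGrid = tunegrid, trControl = control)
custom_tune
```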

##    .mtry .sampsize .ntree
## 1      3        50    200
## 2      4        50    200
## 3      5        50    200
## 4      3       100    200
## 5      4       100    200
## 6      5       100    200
## 7      3       200    200
## 8      4       200    200
## 9      5       200    200
## 10     3        50    300
## 11     4        50    300
## 12     5        50    300
## 13     3       100    300
## 14     4       100    300
## 15     5       100    300
## 16     3       200    300
## 17     4       200    300
## 18     5       200    300
## 19     3        50    400
## 20     4        50    400
## 21     5        50    400
## 22     3       100    400
## 23     4       100    400
## 24     5       100    400
## 25     3       200    400
## 26     4       200    400
## 27     5       200    400
## 121 samples
##  19 predictor
##   2 classes: 'Over', 'Under.Equal' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 97, 97, 97, 96, 97, 97, ... 
## Resampling results across tuning parameters:
## 
##   mtry  sampsize  ntree  ROC        Sens       Spec     
##   3      50       200    0.9903810  0.8533333  1.0000000
##   3      50       300    0.9910159  0.8300000  1.0000000
##   3      50       400    0.9910159  0.8266667  1.0000000
##   3     100       200    0.9913333  0.8500000  1.0000000
##   3     100       300    0.9871429  0.8266667  1.0000000
##   3     100       400    0.9910159  0.8300000  1.0000000
##   3     200       200    0.9910159  0.8300000  1.0000000
##   3     200       300    0.9897460  0.8400000  1.0000000
##   3     200       400    0.9903492  0.8400000  0.9980952
##   4      50       200    0.9913333  0.8633333  1.0000000
##   4      50       300    0.9916508  0.8666667  1.0000000
##   4      50       400    0.9910159  0.8533333  1.0000000
##   4     100       200    0.9909841  0.9000000  1.0000000
##   4     100       300    0.9897143  0.8666667  1.0000000
##   4     100       400    0.9916508  0.8766667  1.0000000
##   4     200       200    0.9906984  0.8766667  1.0000000
##   4     200       300    0.9925873  0.8666667  1.0000000
##   4     200       400    0.9929206  0.8533333  1.0000000
##   5      50       200    0.9916508  0.9233333  1.0000000
##   5      50       300    0.9910159  0.8966667  1.0000000
##   5      50       400    0.9922857  0.8966667  1.0000000
##   5     100       200    0.9903810  0.8966667  1.0000000
##   5     100       300    0.9916508  0.8866667  1.0000000
##   5     100       400    0.9916508  0.9100000  1.0000000
##   5     200       200    0.9922857  0.8866667  1.0000000
##   5     200       300    0.9910159  0.8866667  1.0000000
##   5     200       400    0.9916508  0.8633333  1.0000000
## 
## ROC was used to select the optimal model using the largest value.
## The final values used for the model were mtry = 4, ntree = 400 and sampsize
##  = 200.

Evaluation

# Evaluation of Model
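A minimal evaluation sketch, assuming a held-out data frame `test` exists with the same columns as `train`:

```r
# Predict the combined target on the held-out set and tabulate performance.
test_pred <- predict(combined_RF_2, newdata = test)
caret::confusionMatrix(test_pred, test$combined_target)
```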

Fairness Assessment

Conclusion

What can we say about the results of the methods section as they relate to our question, given the limitations of the model?

Future Recommendations

What additional analysis is needed, and what limited our analysis in this project?